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Abstract 

A scheme is derived for learning connectivity in spiking neural networks. 
The scheme learns instantaneous bring rates that are conditional on the ac¬ 
tivity in other parts of the network. The scheme is independent of the choice 
of neuron dynamics or activation function, and network architecture. It 
involves two simple, online, local learning rules that are applied only in re¬ 
sponse to occurrences of spike events. This scheme provides a direct method 
for transferring ideas between the belds of deep learning and computational 
neuroscience. This learning scheme is demonstrated using a layered feed¬ 
forward spiking neural network trained self-supervised on a prediction and 
classihcation task for moving MNIST images collected using a Dynamic Vi¬ 
sion Sensor. 
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1. Introduction 

Methods in deep learning for training neural networks (NNs) have been 
very successfully applied to a range of datasets, performing tasks at levels 
approaching human performance, such as image classihcation [I], object de¬ 
tection [Ij and speech recognition mu- Along with these experimental 
successes, the held of deep learning is rapidly developing theoretical frame¬ 
works in representation learning Md including understanding the ben- 
ebts of different types of non-linearities in neuron activation functions [8], 
disentanglement of inputs by projecting onto hidden layer manifolds, model 
averaging with techniques like maxout and dropout 0EED! and assisting gen¬ 
eralization through corruption of input with denoising autoencoders EH- 

These types of experimental and theoretical work are necessary to effec¬ 
tively build and understand systems like brains that are capable of learning 
to solve real world problems. Many of the successes of deep learning are a 
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result of a broad inspiration from biology; however, there is a large gap in 
understanding how the principles of deep learning are related to those of the 
brain. Some elements of deep learning may well inspire discoveries in brain 
function. Equally, deep learning systems are still inferior to the brain in 
aspects such as memory, thus efforts to develop models that bridge between 
deep learning and neuroscience are likely to be mutually beneficial. 

The neuron models commonly used in deep learning are abstracted away 
from neuron models that are used in computational neuroscience to model 
biological neurons. Spiking is a salient feature of biological neurons that is 
not typically present in deep learning networks. It is not yet understood 
why the brain uses spiking dynamics; for the purposes of machine learning 
it would be useful to know what if any advantages spiking dynamics confers 
spike based NN learning algorithms over other types of NN learning algo¬ 
rithms, rather than advantages that are otherwise useful in implementing 
algorithms in biology such as energy efficiency and robustness. Dynamical 
systems like spiking networks appear more naturally suited to processing 
continuous time temporal data than state machines, as deep networks are 
usually implemented, but this idea is yet to be demonstrated experimentally 
on machine learning tasks. 

In an effort to bridge this gap in understanding between spike, and non¬ 
spike based NN learning systems, and develop systems for processing event 
based, continuous time data, this paper develops a scheme for learning con¬ 
nectivity in a spiking neural network (SNN). The scheme is based upon learn¬ 
ing conditional instantaneous firing rates, linking it to many of the statistical 
frameworks previously developed in deep learning that are based on condi¬ 
tional probabilities. However, our scheme is fundamentally different to most 
methods used in deep learning as the learning rules are based solely on the 
activity of the neurons in the network and are the same, independent of 
the choice of neuron dynamics or activation function unlike gradient descent 
methods [12], and they can be implemented online and do not require periods 
of statistical sampling from the model unlike energy based methods P3j. In 
addition, the learning scheme is local, meaning that modifying a connection 
only requires knowledge of the activity of the neurons it connects, not neu¬ 
rons from a distant part of the network, unlike gradient descent and energy 
based methods pjH 13]. From a perspective of biological plausibility, this 
means neurons do not have to make assumptions about, or communicate 
to each other their dynamics or activation function and associated param¬ 
eters in order to correctly learn, and the system can be run online without 
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interrupting processing with periods of sampling for learning. 

This paper describes a general scheme for event based learning in SNNs. 
This scheme is demonstrated on a network similar to that commonly used in 
deep learning, specifically, a feedforward layered network architecture with 
rectified linear units, and piecewise constant temporal connectivity. Dropout 
is utilized to show that many ideas from deep learning can be directly im¬ 
ported into SNNs using this learning scheme. An event based dataset of 
moving MNIST digits collected using an DVS camera HHH5JH6] is used to 
train the network for both prediction and classification tasks. 

2. Learning Theory 

We begin by developing a method for learning the connectivity of a super¬ 
vised output neuron. The discussion will be framed in reference to learning 
in a network operating continuously in time with temporally delayed connec¬ 
tivity and temporally encoded input signals since spiking neurons are usually 
modelled as dynamical systems; however, the results are also applicable to 
networks operating in discrete timesteps (as is usual in implementations of 
SNNs with current standard computer architecture), with or without tempo¬ 
rally delayed connectivity and temporally encoded inputs, so they can also 
be applied to traditional artificial neural networks performing static image 
classification, for example. 

Figure [l] shows a general network containing input neurons whose activity 
is determined by an external source, hidden neurons, and supervised output 
neurons. At present we do not assume any particular connection architecture, 
nor do we specify the dynamics or activation functions of the individual 
neurons. 

We first consider learning the input, Q 0 (t), that a supervised output neu¬ 
ron o receives from the network at time t. We need to identify the mathemati¬ 
cal quantity that o should learn - the quantity that Q 0 (t) should approximate. 
Note that Q 0 (t) may be calculated from two sets of quantities only. The first 
quantities are the weights W, that connect the activity of the network to o, 
potentially including self connections and connections that have a temporal 
delay. We call these connections weights as this is the terminology usually 
used in machine learning; however, these connections can also be imagined 
as propagator functions whose value varies with temporal delay. The sec¬ 
ond quantity is the history of activity of the neurons in the network, H, 
which since we are primarily interested in SNNs and event based learning, 
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Figure 1: Schematic of a general neural network consisting of input neurons controlled by 
an external source, hidden neurons and supervised output neurons. Connectivity between 
neurons is unrestricted, directed connections are allowed between every pair of neurons 
including self connections, as are temporally delayed connections. 
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we model as a set of one dimensional Dirac delta functions in time that may 
be normalized to non unit integral to allow a neuron’s spike strength to be a 
real value, instead of only binary as in many SNN models. However, since it 
allows the use of more intuitive terminology of spike rates, instead of spike 
strength rates, we assume that a spike with real valued strength is equivalent 
to a sum of simultaneous unit strength spikes and possibly one partial unit 
spike. Alternatively, this equivalence holds if spike strengths are restricted 
to a unit value and simultaneous spikes are not allowed. In any case the 
mathematical description and quantitative results are unchanged aside from 
a possible conversion function if simultaneous spikes are not considered to 
combine additively into a single real valued spike and vice-versa. 

Assuming W is fixed after learning, the only time varying quantity that 
can be used in calculating Q 0 (t) is H(t), so we re-parametrize Q 0 {t) to Q 0 (H). 
We then propose that a sensible output of the network to o is Q(o\H), the 
mean conditional instantaneous spike rate (during training) of supervised 
output neuron o, given activity H in the network. Integrating Q over a time 
period gives an expected number of spikes. Thus, in the case of a network 
operating in discretized time, as is common in the implementation of artificial 
SNNs in code, Q(o\H) can be interpreted as an expected number of spikes 
o given H , where the integral over a small discrete timestep is understood. 
That is, if we observe activity H in the network n times during training, 
and o spikes n 0 times during timesteps coinciding with those n occurrences 
of H , then after training if we observe H again, the output that should have 
been learnt and produced by the network at that timestep is n 0 /n. If only 
single unit strength spikes are allowed at each timestep, then this output 
can be interpreted as P(o\H), the probability of o spiking during the given 
timestep, given H. This interpretation is important in connecting the focus 
on probability distributions in machine learning with the focus on spike and 
rate coded networks in computational neuroscience HZj. 

Of course if H includes the full history of the network’s activity from 
inception, then only one sample trajectory of H will be observed and used 
for learning. However, we assume that W approaches zero as the connection 
time delay becomes large, meaning that only some recent history of activity 
in the network is used in calculating Q(o\H). Thus, a variety of different 
H will be observed during training. In any practical network W will be 
finitely parametrized, so for any particular H k the parameters modified in 
learning Q(o\H k ) will in effect learn for both a range of other H that are 
slight perturbations of H k and also use the same parameters, as well as very 
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different H that only share a portion of the same parameters. 

If we assume the neurons in the network are spiking neurons, then there 
are only four events that occur within the network at which to apply learning 
rules that modify Q 0 (H). They are (i) when an input neuron spikes, (ii) 
when a hidden neuron spikes, (iii) when a supervised output neuron spikes 
due to supervision, and (iv) when a supervised output neuron spikes due to 
its own dynamics or activation function. Modification could also be made 
continuously at all times, at randomly generated time points, or according 
to a temporally periodic function; however, we proceed concentrating on the 
spiking events. 

Let D be a function that is applied to Q 0 (H) when H occurs (a combina¬ 
tion of events (ii)), and let U be a function that is applied to Q 0 (H) for each 
supervised spike o (an event (i)) that co-occurs with H (see Fig. [2]). After 
some period of training time t, we have a series of iterated applications of D 
and U applied to the initial value Q G 0 (H) 

Ql(H) = (D o U o ... o U) o ... o (D o U o ... oU)o Q°(H), (1) 

where the brackets group operations for each occurrence of H. 

To find a relation between U and D, suppose now that the initial value 
Qo(H) = Q(o\H ) as we desire. Clearly, we also require that Qo(H) ~ 

Q(o\H), meaning that the application of the learning rules D and U do 
not cause the output to significantly deviate from the desired value. Note 
that we cannot require strict equality due to the stochastic nature of the 
event occurrences. If we choose 

U = (2) 

where superscripts indicate composition, not exponentiation and J is an 
unknown function still to be determined, then with the initial value Q®(H) = 

Q(o\H), Eq, 0 becomes 

Qi(H) ={Do D j( o ... o D J(Qt " m Ao...o(D o £>"“■■'">' o ... o £>•'<«'("»') Q[o\H). 

( 3 ) 

Let N be the number of occurrences of H. For Q l 0 {H) ~ Q(o\H), we 
require that all the applications of D and U approximately cancel, i.e. 

+ 0. (4) 
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Figure 2: Schematic showing the timing of events in the operation and learning of the 
network. Neuron spike events are indicated by crosses, learning rule applications are in¬ 
dicated by dots. Vertical dotted lines indicate timesteps of width r used in the network’s 
implementation in code (the network can, given suitable hardware, be operated as a dy¬ 
namical system in continuous time). Note that although variation in the location of spikes 
within a timestep occurs, this variation is not resolved by a time stepped implementation, 
additionally single neurons can spike multiple times within a single timestep. This dia¬ 
gram focuses upon learning for a particular hidden neuron activity pattern Hi indicated 
by red crosses, (i) For output neuron o, U° is applied each time o spikes in conjunction 
with Hi, D° is applied each time Hi is observed, (ii) For input neuron i , each time i 
spikes in conjunction with the beginning of Hi, U{ is applied at the conclusion of Hi as 
Hi must be observed in order to identify the connections to modify. D\ is applied each 
time H[ occurs, (iii) For prediction neuron p, each time p spikes time At after Hi occurs, 
Uf is applied. D\ is applied each time Hi occurs. 
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The expected number of applications of U is NQ(o\H), and we require that 
QoiH) ~ Q(o\H) at all points in this sequence of applications of D and U, 
so we have 

J(Q 0 (H))NQ(o\H) « -N. (5) 

However, since we do not know Q(o\H) a priori we use the network’s current 
estimate Q 0 (H) instead and set 

JmH)) = -QjHY (6) 

Using Eq. (|2]) we now have the following relation between the function 
D that is applied when H occurs, and the function U that is applied when o 
spikes due to supervision 

U = D~^>. (7) 

This requires that D has a unique inverse, and D~ 1 can be generalized in 
such a way as to be applied a fractional number of times. 

In the above we required that Q* (77) ~ Q{o\H) at all points in a sequence 
of applications of D and the U. This implies that any single application of 
either D or U when Q 0 (H ) ~ Q(o\H), can only change Q 0 (H) by a small 
(but not necessarily fixed) amount 


Qo — cu <U ( Q 0 ) < Qo + £[/, ( 8 ) 

Qo — £d < D(Q 0 ) < Qo + Pd, (9) 

which using ([7| leads to the relation 

£d = Qo^U- (10) 


The required range of Q 0 is [0,oo). To ensure that ejj remains small as 
Qo —> 0, we require €d be chosen so that in the lirriQ^o, remains finite. 
Alternatively it would be possible to insert noise spikes, for example Poisson 
noise with rate m into the supervision to fix a minimum target value of Q 0 
to m, hence bounding Q a > 0 and eliminating the divergence in Eq. (10). 
After learning this noise can be stopped and subtracted from the learnt value 
of Q(o\H). In most cases the maximum value of Q will be finite, and hence 
6o and ejj can be chosen to give sufficiently small changes. 

To avoid Q 0 converging to an unwanted value, this learning scheme must 
have only a single globally stable fixed point Q 0 = Q(o\H). This means that 




U(Q) and D(Q) cannot both have fixed points at any Q. We therefore adjust 
Eqs. ^ and (j^[) to 


Qo~ eu <U ( Q 0 ) < Q 0 or Qo <U(Q 0 ) <Q a + tu, 


( 11 ) 


and 


Qo < D(Q 0 ) < Q 0 + e D or Q a - e D < D{Q 0 ) < Q 0 . 


( 12 ) 


We choose between either the two left, or two right options in ( [If] ) and 
(12) by considering the stability of the fixed point Q(o\H) for each of these 
choices. Taking equalities in the above equations and using Eq. © , the 
total change to Q a after N applications of D and an expected Q{o\H)N 
applications of U is 


A « ±NQ 0 e v t Q{o\H)Ne v . (13) 

If Qo > Q(o\H) we require A < 0, and if Q a < Q(o\H) we require A > 0. 
This implies the following choice for our learning rule restrictions 


Qo — £d A D(Qo) < Qo, 

(14) 

Qo < U(Q 0 ) < Qo + €jj, 

(15) 


that is, D slightly decreases Q 0 and U slightly increases Q a . 

3. Application to Learning Layers of Autoencoders 

We now outline a demonstration of this learning scheme. A standard 
method for training an unsupervised deep feedforward network is to train 
each pair of layers successively as autoencoders [6] so that each layer encodes 
the activity of the layer below it, see Fig. [3] The learning rules described in 
Sec. [2] can be used to learn layers of autoencoders by replacing the supervised 
output neuron o , with an input neuron i that self-supervises, and by reversing 
the direction of connectivity so that i learns to output Q(i\H), where H is 
now the future activity of the hidden neurons in the layer above i, since 
causality is reversed from the previous case; the input layer causes activity 
in the hidden layer above, see Fig. [2] 

Using this method, the hidden layers learn so that by observing a period 
of hidden layer activity, the activity of the layer below at the beginning of 
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Figure 3: (a) Architecture of the feedforward layered network, (b) Illustration of piecewise 
constant connectivity between two neurons in Eq. ( |18| ) . (c) Rectified linear unit activation 
function for hidden neurons used in this network, (d) Illustration of the spiking activity of 
a hidden neuron, vertical lines indicate the presence of a Dirac delta function, with height 
corresponding to different normalizations of each individual Dirac delta function. 
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that observation can be inferred. The activity of the hidden layer and the 
connectivity between layers acts like, and encodes a short term memory. 

In this demonstration we also include a layer of prediction neurons that 
predict the activity of the input layer at a specified time period in the future. 
These neurons are supervised by the activity of the input layer with the 
corresponding prediction time period delay, see Fig. [2] We also include a 
layer of digit classification neurons that are trained as for a supervised output 
neuron o, see Fig. [2j 

So far we have not needed to specify the dynamics or activation function of 
the hidden neurons in the network in order to develop this learning scheme. 
Spiking neuron models in computational neuroscience are often dynamical 
systems modeled using differential equations [T8] , In contrast, neurons in 
machine learning are typically characterized by an activation function of the 
neuron’s input [6]. Any of these types of neuron models could be employed 
here; however, we choose rectified linear units (ReLUs) that are commonly 
used in deep learning networks [6|. The form we use here is 


A(I) = I, 
= 0, 


I > 0, 

I < 0, (16) 


see Fig. B 


3.1. Weight Update Rules 

We have so far developed rules for learning a value Q 0 to approximate 
Q(o\H); however, we have not yet discussed rules for modifying W that 
are necessary for implementation in a network. Before these rules can be 
determined, the formula for calculating Q a from H and W needs to be chosen. 
The most common choice is to use the product of H and W summed across 
all neurons in the layer below and integrated across time in the case of time 
delay connections. We use this same choice here 



hj{t')wj(t — t')dt ', 


(17) 


where h 3 is a hidden neuron connected to o and Wj is the corresponding 
connection between them. Other choices are possible and may have advan¬ 
tages over this choice, though this is left for future investigation. We also 
need to choose a parametrization for W. A wide variety of choices could 
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be made here such as sums of continuous functions, or convolution kernels 
acting across different sets of j, as is done in convolutional neural networks 
by modifying (17) to include a convolution across j as well as t. However as a 
first demonstration of this learning scheme we make a simpler choice of using 
a piecewise constant function (see Fig. [3]3) that is easy to conceptualize and 
produces simple learning rules for the weight parameter updates 


K 

Wj(t ) = [S(t + (k - l)r) - S(t + hr )}, 

k= 1 


(18) 


where S is the Heaviside step function, r defines the width of each of the K 
pieces of Wj, and the cOk are modified by learning. We simplify this notation 
to use 

w jk = Uk [S(t + (k - l)r) - S(t + kr )], (19) 

where Wjk are effectively the time delayed weights in the network. For time 
delays greater than tK, the connectivity weight is zero, meaning that only 
activity histories H of length tK are used in calculating Q 0 . 

Assuming H is composed of spikes modeled as delta functions, Eq. (17) 
becomes a sum of weights multiplied by the numbers of spikes 


QM = J2 Wjkhjtf, (20) 

The following simple and fast weight update rules satisfy Eqs. (14) and ( [l5| ), 
though other choices are possible. A weight update rule d that implements 
D when H occurs is 

d(w) = w — ehQ, (21) 


and a corresponding weight update rule u that implements U when supervi¬ 
sion spikes o occur is 

u(w) = w + eho, (22) 


where e is a hyperparameter of the learning rules and should be chosen to be 
appropriately small. These learning rules cause Q a to fluctuate within a small 
range of Q(o\H) and it may be useful to change e with time to allow a initial 
period of fast convergence from the initialization point, and then a reduced 
fluctuation error once Q 0 ~ Q(o\H). Again, these rules are not specific to 
the ReLUs that we demonstrate with, these neurons could be replaced with 
sigmoid units, for example, without changing these weight update rules. 

We use the same weight update rules for learning to predict the activity of 
the input layer from the activity of the hidden layers, where during learning 
the prediction neurons are supervised by the future input, see Fig. [2j 
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3.2. DVS MNIST Event Based Dataset 

To demonstrate this learning scheme we use a dataset collected using 
a Dynamic Vision Sensor (DVS) [T6]. The DVS is a type of video camera 
that collects event data, unlike conventional video cameras that collect frame 
data. In the camera an event is triggered by the light intensity impinging 
upon a pixel changing above a threshold amount. Upon such an event, the 
camera outputs the pixel coordinates, a timestamp (in /as) and the polarity 
of the change in intensity. 

The MNIST database [15] has been used extensively in the development 
of deep learning [6j. With the view of linking this work to previous work in 
deep learning, we demonstrate this learning scheme using a DVS version of 
the MNIST database p3] 15] in which the handwritten digits are displayed 
and moved on an LCD screen that is being recorded by a DVS camera. In this 
dataset the light intensity changes collected by the DVS camera are primarily 
edges of the moving MNIST digits; however, in general the camera also 
captures other scene changes such as changes in illumination. The resulting 
dataset is noisy. Viewing the recorded data reveals that the edges are often 
blurred, and the number of events captured is not uniform across a digit’s 
edges. The dataset also appears to contain some events that are not related 
to the movement of the MNIST digit on the LCD screen; however, these 
events are relatively few in number. The dataset contains recordings of 1000 
handwritten digits for each integer from 0 to 9. We use the first 900 entries 
for training and the last 100 entries for testing. The DVS’s 128 x 128 array 
of pixels is cropped down to 23 x 23 pixels with each of these pixels mapped 
onto two input neurons, one for each polarity of light intensity change. The 
input training sequence was formed from a random selection of the individual 
MNIST digit sequences each separated by 15 timesteps or 75 ms of no input. 
Each individual MNIST event sequence has a duration of about 77 timesteps 
or about 2.3 s. 

Each pair of neurons have five iVk parameters encoding weights for con¬ 
nection delays hr of width 30 ms corresponding to the network’s execution 
timesteps of 30 ms. In this demonstration we predict 15 timesteps or 450ms 
into the future. An additional ten output neurons are used to classify the 
current input as a digit from zero to nine. All connection weights c <j between 
layers were initialized to small random values to the range [0, e] where e was 
initially set to 1 x 10~ 5 . The connection weights for the prediction and clas¬ 
sification neurons were all initialized to zero and used an initial value of e of 
2.5 x 10 -6 , corresponding to the e value for the between layer connections di- 
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vided by the number of layers, since the prediction and classification neurons 
connect to all layers. The connections between hidden layers were trained 
one layer at a time for one pass through the training dataset, corresponding 
to 8.8 x 10 5 timesteps. After each pass through the hidden layers e was de¬ 
creased by half for all connections and training was repeated, beginning at 
the first hidden layer. Note that the initial value, decay and decay period 
for e are not heavily optimized. The prediction and classification weights 
from all hidden layers were trained at every timestep. To demonstrate that 
many ideas used in deep learning are directly transferable to a spiking neural 
network that learns using this scheme, during training we use 50% dropout 
m for each hidden layer. 

The operation of the trained network is demonstrated in Fig. [4j The 
hidden layers are very active since in this demonstration the neurons have 
a threshold fixed at zero. Including a learnable threshold would produce 
a more sparse representation whilst also reducing the required cpu time as 
the network’s operation and learning are both dependent on the number of 
events that occur. The inference of the noisy input is significantly better than 
the prediction since the inference involves a memory of the input whereas 
the prediction does not. However, a smoothed version of the future input 
is usually identifiable in the prediction. The inference and predictions are 
often poor when the digit changes direction as the edges at these points are 
weak and the data are particularly noisy. The classification output correctly 
classifies the input digit 87.41% of the time. The classification error as a 
function of training time is shown in Fig. [5j 

Figure [6] shows receptive fields of neurons from the first hidden layer and 
predictive fields of neurons from all layers. Both positive (excitatory) and 
negative (inhibitory) weights are learnt. Initially all neurons are active and 
the small random weight vectors converge toward a time averaged input vec¬ 
tor. Upon converging toward the time averaged input, the weight vectors are 
nearly identical; however, differences due to the small random initialization 
breaks their symmetry and the weights of different neurons begin to diverge 
toward other more specific features of the input. This process continues 
as these features themselves are further split into other even more specific 
features. After learning is stopped, some of the receptive fields are tuned 
toward responding to a small number of pixels, while others respond to a 
distributed pattern of pixels. Predictive fields tend to be composed of larger 
patches of the sensory held indicating that the encoding of the prediction is 
distributed across many neurons. Without dropout, denoising autoencoding 
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Figure 4: A demonstration of the feedforward network described in the text applied to 
the DVS MNIST dataset, (a) The present input to each neuron. Vertical red lines divide 
layers. Neurons 1 to 1058 are input neurons, thereafter each 1000 neurons form successive 
hidden layers. The number of presently active neurons in each layer are indicated at the 
top of this frame, (b) The activity of the DVS input delayed by 5 timesteps (corresponding 
to the maximum connection delay), (c) The inferred activity of the input Q(i\H) from the 
recent activity of first hidden layer, (d) The activity of the DVS input 15 timesteps into 
the future, (e) The network’s prediction Q(p\H) of the activity of the input 15 timesteps 
into the future, (f) Classification of the present input Q(c\H). (g)-(j) As for (b)-(e) with 
polarity removed by summing the activity of both polarities, (k) Sum of squared errors 
normalized by the sum of squares of the data at each timestep for the inferences and 
predictions in (c) and (e). An additional file is available to view this figure as a movie. 
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Figure 5: Network classification error on the test set vs training time. Connections 
between layers are trained in a sequence from lowest to highest, vertical dashed lines 
indicate the end of each pass through the network, and the points at which e is halved. 
At the end of training the error on the test set is 12.59% 
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Figure 6: (a) Example receptive fields from eight neurons in the first hidden layer, (b)-(e) 
Example predictive fields for neurons in the input layer and hidden layers one to three 
respectively. In all frames each pixel is the sum of all temporal connection weights u> for 
that pixel. All fields have been normalized to have equal maximums. 

or another regularization method, the connectivity between hidden layers 
forms an identity mapping, with each neuron connecting only to a single 
neuron in the previous layer. 

4. Summary 

This paper introduces an event based learning scheme for neural net¬ 
works. The scheme does not depend on the specific form of the neuronal 
dynamics or activation function, and while this paper focuses on training 
spiking neural networks, this scheme may also be used to train traditional 
artificial neural networks, especially those that involve discontinuous activa¬ 
tion functions that defeat gradient descent methods. The scheme may also 
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be applied to networks of neurons containing biologically inspired dynamics. 
Future work in this direction may inform theories of dynamics and learning 
in the brain. The broad applicability of this learning scheme provides an 
avenue to directly apply ideas from both deep learning and computational 
neuroscience and thus strengthen and inform the theoretical progress in both 
fields. 
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